Towards the Lemmatisation of Polish Nominal Syntactic Groups Using a Shallow Grammar
Abstract
While morphological analysers and taggers usually assign lemmata to wordforms, these tools operate on single words. For some tasks, a tool that lemmatises (and thus normalises) whole phrases would be more appropriate. The paper presents, discusses and evaluates a set of tools for lemmatising nominal groups, based on a shallow grammar for Polish. The tools reach an overall success rate of over 58%, and almost 83% on the nominal groups that the grammar recognises correctly. The approach should be portable to other languages, especially morphologically rich ones.
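The following minimal Python sketch shows the difference between word-level and phrase-level lemmatisation. It is not the paper's tool. The `Token` type, the toy `GENERATOR` table and the functions `naive_lemmatise` and `lemmatise_group` are all hypothetical. The sketch assumes that a shallow grammar has already found the group's head and the modifiers that agree with it. It also assumes that "normalising" a group means putting the head and its agreeing modifiers into the nominative while keeping the head's number and gender.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One word of a nominal group, as tagged by a morphological analyser."""
    form: str    # surface wordform in the text
    lemma: str   # single-word lemma assigned by the analyser/tagger
    number: str  # grammatical number, e.g. "sg" or "pl"
    gender: str  # gender tag, e.g. "m1", "m3", "f"


# Hypothetical toy generator: (lemma, case, number, gender) -> wordform.
# A real system would query a full inflectional dictionary of Polish instead.
GENERATOR = {
    ("nowy", "nom", "sg", "m3"): "nowy",
    ("dom", "nom", "sg", "m3"): "dom",
    ("stary", "nom", "sg", "f"): "stara",
    ("kobieta", "nom", "sg", "f"): "kobieta",
}


def naive_lemmatise(tokens):
    """Word-by-word baseline: replace every wordform with its own lemma."""
    return " ".join(tok.lemma for tok in tokens)


def lemmatise_group(tokens, head_index, agreeing):
    """Phrase-level lemmatisation of one nominal group (illustrative sketch).

    head_index and agreeing (indices of modifiers that agree with the head)
    are assumed to come from a shallow grammar. The head and its agreeing
    modifiers are regenerated in the nominative, keeping the head's number
    and gender; all other tokens (e.g. genitive complements) stay unchanged.
    Raises KeyError if the toy generator lacks a required form.
    """
    head = tokens[head_index]
    words = []
    for i, tok in enumerate(tokens):
        if i == head_index or i in agreeing:
            words.append(GENERATOR[(tok.lemma, "nom", head.number, head.gender)])
        else:
            words.append(tok.form)
    return " ".join(words)


# "nowego domu Jana" (genitive: "of Jan's new house")
group = [
    Token("nowego", "nowy", "sg", "m3"),
    Token("domu", "dom", "sg", "m3"),
    Token("Jana", "Jan", "sg", "m1"),
]
print(naive_lemmatise(group))                   # nowy dom Jan
print(lemmatise_group(group, 1, agreeing={0}))  # nowy dom Jana
```

Word-by-word lemmatisation gives the ungrammatical "nowy dom Jan". Regenerating only the head and its agreeing adjective gives "nowy dom Jana", because the genitive complement "Jana" is outside the agreement chain and must keep its form.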
Publication date: 2011